Neural Radiance Fields (NeRFs) are coordinate-based implicit representations of 3D scenes that use a differentiable rendering procedure to learn a representation of an environment from images. This paper extends NeRFs to handle dynamic scenes in an online fashion. We do so by introducing a particle-based parametric encoding, which allows the intermediate NeRF features -- now coupled to particles in space -- to move with the dynamic geometry. We backpropagate the NeRF's photometric reconstruction loss into the positions of the particles in addition to the features they are associated with. The position gradients are interpreted as particle velocities and integrated into positions using a position-based dynamics (PBD) physics system. Introducing PBD into the NeRF formulation allows us to add collision constraints to the particle motion and creates future opportunities to add other movement priors into the system, such as rigid and deformable body constraints. We show that by allowing the features to move in space, we incrementally adapt the NeRF to the changing scene.
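To make the update rule concrete, here is a minimal sketch of the core loop under illustrative assumptions: `render_loss` stands in for the paper's differentiable rendering and photometric loss, and a simple clamp stands in for the PBD constraint projection; none of these names come from the paper.

```python
import torch

N, F, dt = 1024, 32, 0.1          # particles, feature dim, integration step (illustrative)

pos = torch.randn(N, 3, requires_grad=True)   # particle positions
feat = torch.randn(N, F, requires_grad=True)  # NeRF features attached to the particles
opt = torch.optim.Adam([feat], lr=1e-3)

def render_loss(pos, feat):
    # Stand-in for differentiable volume rendering plus photometric loss.
    return ((pos.sum(-1, keepdim=True) * feat) ** 2).mean()

for step in range(100):
    opt.zero_grad()
    pos.grad = None
    loss = render_loss(pos, feat)
    loss.backward()
    opt.step()                                 # ordinary gradient step on the features

    with torch.no_grad():
        vel = -pos.grad                        # position gradient read as a velocity
        pred = pos + dt * vel                  # PBD: predict new positions
        pred.clamp_(-1.0, 1.0)                 # stand-in collision/constraint projection
        pos.copy_(pred)                        # commit the corrected positions
```

In a full PBD solver the constraint projection would resolve collisions and other priors, and velocities would be recomputed from the corrected positions before the next step.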
Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning. Skills are typically extracted from expert demonstrations and are embedded into a latent space from which they can be sampled as actions by a high-level RL agent. However, this skill space is expansive, and not all skills are relevant for a given robot state, making exploration difficult. Furthermore, the downstream RL agent is limited to learning tasks structurally similar to those used to construct the skill space. We first propose accelerating exploration in the skill space using state-conditioned generative models to directly bias the high-level agent towards only sampling skills relevant to a given state based on prior experience. Next, we propose a low-level residual policy for fine-grained skill adaptation, enabling downstream RL agents to adapt to unseen task variations. Finally, we validate our approach across four challenging manipulation tasks that differ from those used to build the skill space, demonstrating our ability to learn across task variations while significantly accelerating exploration, outperforming prior works. Code and videos are available on our project website: https://krishanrana.github.io/reskill.
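The following is a schematic of how the pieces might compose at decision time, under stated assumptions: `skill_prior`, `high_level`, `skill_decoder`, and `residual` are hypothetical stand-ins for the state-conditioned generative model, high-level agent, skill decoder, and residual policy, not the project's actual API.

```python
import torch

def act(state, skill_prior, high_level, skill_decoder, residual):
    # The state-conditioned generative model proposes a skill latent that prior
    # experience suggests is relevant in this state, biasing exploration.
    z_prior = skill_prior(state)                 # e.g., a sample from p(z | s)
    z = high_level(state, z_prior)               # high-level agent acts near the prior
    base = skill_decoder(z, state)               # decode the latent skill to actions
    # Residual policy applies fine-grained corrections for unseen task variations.
    return base + residual(state, base)

# Tiny stand-in modules, just to show the data flow end to end.
d_s, d_z, d_a = 8, 4, 2
state = torch.randn(d_s)
skill_prior = lambda s: torch.tanh(s[:d_z])
high_level = lambda s, zp: zp + 0.1 * torch.randn(d_z)
skill_decoder = lambda z, s: torch.tanh(torch.cat([z, s]))[:d_a]
residual = lambda s, a: 0.05 * torch.tanh(a)
print(act(state, skill_prior, high_level, skill_decoder, residual))
```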
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.
We show that the model uncertainty of Neural Radiance Fields (NeRFs) can be quantified effectively if a density-aware epistemic uncertainty term is considered. The naive ensembles investigated in prior work simply render RGB images to quantify the model uncertainty caused by conflicting explanations of the observed scene. In contrast, we additionally consider the termination probabilities along individual rays to identify epistemic model uncertainty due to a lack of knowledge about the parts of the scene unobserved during training. We achieve new state-of-the-art performance on an established uncertainty quantification benchmark for NeRFs, outperforming methods that require complex changes to the NeRF architecture and training regime. We furthermore show that NeRF uncertainty can be used for next-best view selection and model refinement.
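A rough sketch of the density-aware idea, assuming an ensemble of NeRFs and standard volume-rendering weights; the random densities and the simple "one minus termination probability" proxy are illustrative, and the paper's exact uncertainty term may differ.

```python
import numpy as np

def ray_weights(sigma, deltas):
    """Standard volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    alpha = 1.0 - np.exp(-sigma * deltas)
    T = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return T * alpha

rng = np.random.default_rng(0)
M, S = 5, 64                       # ensemble members, samples per ray
deltas = np.full(S, 1.0 / S)
sigmas = rng.gamma(2.0, 5.0, size=(M, S))   # illustrative ensemble densities for one ray

# Sum of weights along the ray = probability that the ray terminates at all.
term_probs = np.array([ray_weights(s, deltas).sum() for s in sigmas])
# A low termination probability means the ray passed through unobserved space:
# the model does not know where the ray ends, signalling epistemic uncertainty.
epistemic = (1.0 - term_probs).mean()
print(f"mean termination prob {term_probs.mean():.3f}, epistemic proxy {epistemic:.3f}")
```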
Many high-performing works on out-of-distribution (OOD) detection use real or synthetically generated outlier data to regularize model confidence; however, they often require retraining of the base network or a specialized model architecture. Our work shows that noisy inliers make great outliers (NIMGO) in the challenging field of OOD object detection. We hypothesize that synthetic outliers need only be minimally perturbed variants of the in-distribution (ID) data in order to train a discriminator to identify OOD samples -- without expensive retraining of the base network. To test our hypothesis, we generate a synthetic outlier set by applying an additive-noise perturbation at the image or bounding-box level. An auxiliary feature-monitoring multilayer perceptron (MLP) is then trained to detect OOD feature representations using the perturbed ID samples as a proxy. During testing, we show that the auxiliary MLP distinguishes ID samples from OOD samples at a state-of-the-art level on the OpenImages dataset. Extensive additional ablations provide empirical evidence in support of our hypothesis.
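A minimal sketch of the training recipe as described: perturb ID data with additive noise to manufacture proxy outliers, then train a small discriminator on frozen detector features. The feature-space perturbation, dimensions, and noise scale here are assumptions for illustration; the paper perturbs at the image or bounding-box level.

```python
import torch
import torch.nn as nn

# Illustrative: feature vectors from a frozen detector for ID samples.
feats_id = torch.randn(512, 256)
# Synthetic "outliers": minimally perturbed variants of the ID data.
feats_noisy = feats_id + 0.3 * torch.randn_like(feats_id)

mlp = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

x = torch.cat([feats_id, feats_noisy])
y = torch.cat([torch.zeros(512, 1), torch.ones(512, 1)])  # 1 = proxy outlier
for _ in range(200):
    opt.zero_grad()
    loss = bce(mlp(x), y)
    loss.backward()
    opt.step()

# At test time a high score flags a feature vector as likely OOD;
# the base detector itself is never retrained.
score = torch.sigmoid(mlp(torch.randn(1, 256)))
print(score.item())
```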
We introduce powerful ideas from hyperdimensional computing into the challenging field of out-of-distribution (OOD) detection. In contrast to most existing work, which performs OOD detection based on a single layer of a neural network, we use similarity-preserving semi-orthogonal projection matrices to project the feature maps from multiple layers into a common vector space. By repeatedly applying the bundling operation $\oplus$, we create expressive class-specific descriptor vectors for all in-distribution classes. At test time, a simple and efficient cosine similarity calculation between descriptor vectors consistently identifies OOD samples with better performance than the current state of the art. We show that the hyperdimensional fusion of multiple network layers is critical to achieving best general performance.
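A minimal sketch of the pipeline, assuming pooled per-layer feature vectors, elementwise summation as the bundling operation $\oplus$, and random layer dimensions; all names and sizes are illustrative rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1024                                   # dimension of the common vector space

def semi_orthogonal(d_in, rng):
    # QR gives orthonormal columns: a similarity-preserving projection into R^D.
    q, _ = np.linalg.qr(rng.standard_normal((D, d_in)))
    return q                               # shape (D, d_in)

layer_dims = [64, 128, 256]                # feature dims of the fused layers (illustrative)
projs = [semi_orthogonal(d, rng) for d in layer_dims]

def describe(feature_maps):
    """Project each layer's features into R^D and bundle them with elementwise sum."""
    return sum(P @ f for P, f in zip(projs, feature_maps))

# Class descriptors: bundle the descriptors of every training sample of that class.
train = {c: [[rng.standard_normal(d) for d in layer_dims] for _ in range(50)]
         for c in range(3)}
class_desc = {c: sum(describe(s) for s in samples) for c, samples in train.items()}

def ood_score(feature_maps):
    v = describe(feature_maps)
    cos = [v @ d / (np.linalg.norm(v) * np.linalg.norm(d)) for d in class_desc.values()]
    return 1.0 - max(cos)  # low similarity to every class descriptor => likely OOD

print(ood_score([rng.standard_normal(d) for d in layer_dims]))
```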
While deep reinforcement learning (RL) agents have shown incredible potential in acquiring dexterous behaviours for robotics, they tend to make errors when deployed in the real world due to mismatches between the training and execution environments. In contrast, the classical robotics community has developed a range of controllers that can safely operate across most states in the real world given their explicit derivation. These controllers, however, lack the dexterity required for complex tasks given the limitations of analytical modelling and approximations. In this paper, we propose Bayesian Controller Fusion (BCF), a novel uncertainty-aware deployment strategy that combines the strengths of deep RL policies and traditional handcrafted controllers. Within this framework, we can perform zero-shot sim-to-real transfer, where our uncertainty-based formulation allows the robot to act reliably in out-of-distribution states by leveraging the handcrafted controller, while gaining the dexterity of the learned system. We show promising results on two real-world continuous-control tasks, where BCF outperforms both the standalone policy and controller, surpassing what either could achieve independently. A supplementary video demonstrating our system is available at https://bit.ly/bcf_deploy.
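One plausible reading of the fusion step, sketched for a scalar action: treat the policy and the controller as Gaussian action distributions and combine them by a precision-weighted product, so whichever source is more certain dominates. The numbers and the exact fusion rule are assumptions for illustration; consult the paper for the actual formulation.

```python
import numpy as np

def fuse_gaussians(mu_pi, var_pi, mu_c, var_c):
    """Product of two Gaussian action distributions (precision-weighted fusion).

    When the RL policy is uncertain (large var_pi), the composite action falls
    back toward the handcrafted controller, and vice versa.
    """
    prec = 1.0 / var_pi + 1.0 / var_c
    mu = (mu_pi / var_pi + mu_c / var_c) / prec
    return mu, 1.0 / prec

# Illustrative: policy unsure far from its training distribution, controller confident.
mu, var = fuse_gaussians(mu_pi=0.8, var_pi=4.0, mu_c=0.1, var_c=0.05)
action = np.random.default_rng(0).normal(mu, np.sqrt(var))
print(f"fused mean {mu:.3f}, fused var {var:.3f}, sampled action {action:.3f}")
```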
Deployed into an open world, object detectors are prone to open-set errors: false-positive detections of object classes not present in the training dataset. We propose GMM-Det, a real-time method for extracting epistemic uncertainty from object detectors to identify and reject open-set errors. GMM-Det trains the detector to produce a structured logit space that is modelled with class-specific Gaussian mixture models. At test time, open-set errors are identified by their low log-probability under all Gaussian mixture models. We test two common detector architectures, Faster R-CNN and RetinaNet, across three varied datasets spanning robotics and computer vision. Our results show that GMM-Det consistently outperforms existing uncertainty techniques for identifying and rejecting open-set detections, especially at the low-error-rate operating points required for safety-critical applications. GMM-Det maintains object detection performance and introduces only minimal computational overhead. We also introduce a methodology for converting existing object detection datasets into specific open-set datasets to evaluate open-set performance in object detection.
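A minimal sketch of the test-time rejection logic, assuming per-class logit vectors are available from a trained detector; the synthetic logits, component count, and threshold here are illustrative stand-ins.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_classes, d_logit = 3, 8

# Illustrative: logit vectors of correct detections, grouped by predicted class.
logits_per_class = {c: rng.normal(c, 0.5, size=(200, d_logit)) for c in range(n_classes)}

# Fit one Gaussian mixture model per known class on its training logits.
gmms = {c: GaussianMixture(n_components=2).fit(x) for c, x in logits_per_class.items()}

def open_set_score(logit_vec):
    """Max log-probability under any class GMM; low values signal open-set errors."""
    return max(g.score_samples(logit_vec[None])[0] for g in gmms.values())

threshold = -50.0  # in practice chosen on a validation set for a target error rate
test_logit = rng.normal(10.0, 0.5, size=d_logit)  # far from every known class
if open_set_score(test_logit) < threshold:
    print("rejected as open-set error")
```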
A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery [11]. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset (https://bringmeaspoon.org). Example instruction from the dataset: "Head upstairs and walk past the piano through an archway directly in front. Turn right when the hallway ends at pictures and table. Wait by the moose antlers hanging on the wall."
We develop Bayesian neural networks (BNNs) that allow us to model generic nonlinearities and time variation for (possibly large sets of) macroeconomic and financial variables. From a methodological point of view, we allow for a general network specification that can be applied to either dense or sparse datasets and that combines various activation functions, a possibly very large number of neurons, and stochastic volatility (SV) in the error term. From a computational point of view, we develop fast and efficient estimation algorithms for the general BNNs we introduce. From an empirical point of view, we show both with simulated data and with a set of common macro and financial applications that our BNNs can be of practical use, particularly so for observations in the tails of the cross-sectional or time series distributions of the target variables.
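As a sketch of the model class being estimated, here is one prior-predictive draw from a single-hidden-layer BNN whose error term follows a stochastic-volatility process; the priors, AR(1) volatility parameters, and sizes are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_x, H = 200, 4, 16                  # time steps, regressors, hidden units

# One draw from a prior over the network weights (illustrative standard normals).
W1, b1 = rng.normal(0, 1, (H, d_x)), rng.normal(0, 1, H)
w2, b2 = rng.normal(0, 1, H), rng.normal(0, 1)

# Stochastic volatility: the log-variance of the error follows an AR(1) process,
# so the noise level of the target can itself drift over time.
h = np.zeros(T)
for t in range(1, T):
    h[t] = 0.95 * h[t - 1] + 0.2 * rng.normal()

X = rng.normal(size=(T, d_x))
f = np.tanh(X @ W1.T + b1) @ w2 + b2          # nonlinear conditional mean
y = f + np.exp(h / 2) * rng.normal(size=T)    # heteroskedastic (SV) error term
```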